Unpacking Multi-valued Symbolic Features and Classes in Memory-Based Language Learning

Authors

  • Antal van den Bosch
  • Jakub Zavrel
Abstract

In supervised machine-learning applications to natural language processing, tasks are typically formulated as classification tasks mapping multi-valued features to multi-valued classes. Memory-based or instance-based learning algorithms are suited for such representations, but they are not restricted to them; both features and classes may be unpacked in binary values. We demonstrate in a matrix of empirical tests on a range of natural language learning tasks that when using k = 1 in the k-NN classifier kernel, binary unpacking of features and classes tends to be harmful to generalization accuracy. Unpacking features and classes causes the kernel classifier to rely on smaller sets of nearest neighbors, which generally leads to more misclassifications; only when the data is not sparse in the multi-valued case (when the average number of equidistant nearest neighbors is well above a handful), unpacking can lead to improved generalization accuracy.

1. Multi-valued versus Binary Features and Classes

When a natural language processing task is formulated as a classification task to be learned by a machine learning system, it is common to formulate it as a mapping from a (typically fixed-length) vector of multi-valued symbolic features to a single multi-valued symbolic class. Features and classes typically represent positions in language strings, in which language items occur. Mappings from features to classes represent either symbol conversion tasks (e.g. from letters to phonemes, from ambiguous syntactic word classes to disambiguated classes) or segmentations of the input string represented in the features (e.g. boundaries between morphemes within words; boundaries between syntactic phrases in sentences) (Daelemans, 1995). Supervised machine learning systems tend to have particular biases towards the representation of features and classes they can handle in principle.
For example, when representing a language processing task such as grapheme-phoneme conversion in a multi-layer feed-forward network trained with the back-propagation learning rule, some recoding is necessary from the basic multi-valued features (letters) and classes (phonemes and stress markers) into real-valued or binary input and output unit activation patterns (Sejnowski & Rosenberg, 1987; Dietterich et al., 1995). Arguably the simplest coding of multi-valued features and classes is local or binary coding, where each individual value of each feature or class is represented by a binary value detector that has value 1 when the original individual value is present, and 0 otherwise. This paper deals with using this unpacked binary coding with memory-based learning. Memory-based learning is founded on the hypothesis that performance …
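The local (binary) coding described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the toy letter windows, the class labels, and the helper names (`overlap_distance`, `unpack`, `nearest_set`, `classify`) are all hypothetical, and the distance used is a plain unweighted count of mismatching positions.

```python
from collections import Counter

def overlap_distance(a, b):
    """Count the positions at which two equal-length vectors differ."""
    return sum(x != y for x, y in zip(a, b))

def unpack(instance, vocab):
    """Local coding: one binary detector per (position, value) pair."""
    return [int(instance[pos] == value)
            for pos, values in enumerate(vocab)
            for value in values]

def nearest_set(query, memory, distance):
    """All stored instances at the smallest distance (the k = 1 neighbor set)."""
    dists = [distance(query, feats) for feats, _ in memory]
    best = min(dists)
    return [item for item, d in zip(memory, dists) if d == best]

def classify(query, memory, distance):
    """Majority class among the equidistant nearest neighbors."""
    votes = Counter(label for _, label in nearest_set(query, memory, distance))
    return votes.most_common(1)[0][0]

# Made-up three-letter windows with made-up class labels (illustration only).
memory = [
    (("b", "a", "t"), "A"),
    (("c", "a", "t"), "A"),
    (("b", "i", "t"), "I"),
    (("s", "i", "t"), "I"),
]
query = ("c", "a", "b")

# Value inventory per feature position, including the query's own values.
vocab = [sorted({feats[pos] for feats, _ in memory} | {query[pos]})
         for pos in range(len(query))]

binary_memory = [(unpack(feats, vocab), label) for feats, label in memory]
binary_query = unpack(query, vocab)

multi_dists = [overlap_distance(query, feats) for feats, _ in memory]
binary_dists = [overlap_distance(binary_query, feats) for feats, _ in binary_memory]

print(multi_dists, binary_dists)
print(classify(query, memory, overlap_distance))
```

In this unweighted setting every mismatching multi-valued feature turns into exactly two mismatching bits, so unpacking the features only doubles each distance and leaves the nearest-neighbor set unchanged. The sketch does not model feature weighting or the unpacking of classes, so it should not be read as reproducing the effects reported in the abstract.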

Publication date: 2000